Gromov-Hausdorff stability of linkage-based hierarchical clustering methods

نویسنده

  • A. Martínez-Pérez
چکیده

A hierarchical clustering method is stable if small perturbations on the data set produce small perturbations in the result. This perturbations are measured using the Gromov-Hausdorff metric. We study the problem of stability on linkage-based hierarchical clustering methods. We obtain that, under some basic conditions, standard linkage-based methods are semi-stable. This means that they are stable if the input data is close enough to an ultrametric space. We prove that, apart from exotic examples, introducing any unchaining condition in the algorithm always produces unstable methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Characterization, Stability and Convergence of Hierarchical Clustering Methods

We study hierarchical clustering schemes under an axiomatic view. We show that within this framework, one can prove a theorem analogous to one of Kleinberg (2002), in which one obtains an existence and uniqueness theorem instead of a non-existence result. We explore further properties of this unique scheme: stability and convergence are established. We represent dendrograms as ultrametric space...

متن کامل

Computing the Shape of Brain Networks Using Graph Filtration and Gromov-Hausdorff Metric

The difference between networks has been often assessed by the difference of global topological measures such as the clustering coefficient, degree distribution and modularity. In this paper, we introduce a new framework for measuring the network difference using the Gromov-Hausdorff (GH) distance, which is often used in shape analysis. In order to apply the GH distance, we define the shape of ...

متن کامل

Temporal Hierarchical Clustering

We study hierarchical clusterings of metric spaces that change over time. This is a natural geometric primitive for the analysis of dynamic data sets. Specifically, we introduce and study the problem of finding a temporally coherent sequence of hierarchical clusterings from a sequence of unlabeled point sets. We encode the clustering objective by embedding each point set into an ultrametric spa...

متن کامل

Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation

1- INTRODUCTION The assessment of watershed sediment load is necessary for controling soil erosion and reducing the potential of sediment production. Different estimates of sediment amounts along with the lack of long-term measurements limits the accessibility to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...

متن کامل

Improved Error Bounds for Tree Representations of Metric Spaces

Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1311.5068  شماره 

صفحات  -

تاریخ انتشار 2013